Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths
نویسندگان
چکیده
In this paper, we strive towards the development of efficient techniques in order to segment document pages resulting from the digitization of historical machine-printed sources. This kind of documents often suffer from low quality and local skew, several degradations due to the old printing matrix quality or ink diffusion, and exhibit complex and dense layout. To face these problems, we introduce the following innovative aspects: (i) use of a novel Adaptive Run Length Smoothing Algorithm (ARLSA) in order to face the problem of complex and dense document layout, (ii) detection of noisy areas and punctuation marks that are usual in historical machine-printed documents, (iii) detection of possible obstacles formed from background areas in order to separate neighboring text columns or text lines, and (iv) use of skeleton segmentation paths in order to isolate possible connected characters. Comparative experiments using several historical machine-printed documents prove the efficiency of the proposed technique. 2009 Elsevier B.V. All rights reserved.
منابع مشابه
Adaptive Segmentation with Optimal Window Length Scheme using Fractal Dimension and Wavelet Transform
In many signal processing applications, such as EEG analysis, the non-stationary signal is often required to be segmented into small epochs. This is accomplished by drawing the boundaries of signal at time instances where its statistical characteristics, such as amplitude and/or frequency, change. In the proposed method, the original signal is initially decomposed into signals with different fr...
متن کاملDocument Analysis System
This paper outlines the requirements and components for a proposed Document Analysis System, which assists a user in encoding printed documents for computer processing. Several critical functions have been investigated and the technical approaches are discussed. The first is the segmentation and classijication of digitized printed documents into regions of text and images. A nonlinear, run-leng...
متن کاملPersian Printed Document Analysis and Page Segmentation
This paper presents, a hybrid method, low-resolution and high-resolution, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments document image to a set of regions. By high-resolution page segmentation, by connected components analysis, each region is segmented to homogeneous regions and identifyi...
متن کاملDocument Analysis And Classification Based On Passing Window
In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
متن کاملSignature Segmentation from Machine Printed Documents using Contextual Information
Abstract: Automatic signature segmentation from a printed document is a challenging task due to the nature of handwriting of the signatory, overlapping/touching of signature strokes with printed text, graphics, noise, etc. In this paper we propose an approach towards the problem of signature segmentation. The method first detects the signature blocks and then segments them from the document ima...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Image Vision Comput.
دوره 28 شماره
صفحات -
تاریخ انتشار 2010